Correlation Clustering with Low-Rank Matrices
Abstract
Correlation clustering is a technique for aggregating data based on qualitative information about which pairs of objects are labeled ‘similar’ or ‘dissimilar.’ Because the optimization problem is NP-hard, much of the previous literature focuses on finding approximation algorithms. In this paper we explore how to solve the correlation clustering objective exactly when the data to be clustered can be represented by a low-rank matrix. We prove in particular that correlation clustering can be solved in polynomial time when the underlying matrix is positive semidefinite with small constant rank, but that the task remains NP-hard in the presence of even one negative eigenvalue. Based on our theoretical results, we develop an algorithm for efficiently “solving” low-rank positive semidefinite correlation clustering by employing a procedure for zonotope vertex enumeration. We demonstrate the effectiveness and speed of our algorithm by using it to solve several clustering problems on both synthetic and real-world data.
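To make the low-rank structure concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the objective is written in its "maximize within-cluster similarity" form and that the similarity matrix factors as A = V Vᵀ (positive semidefinite, rank d). Under that assumption the score of a cluster is the squared norm of the sum of its row vectors, so the matrix A never has to be formed. The function names (`set_partitions`, `gram`, `pairwise_score`, `lowrank_score`, `brute_force_best`) are hypothetical, and the exhaustive search over partitions is exponential in the number of objects; it merely stands in for the zonotope vertex enumeration mentioned in the abstract.

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of clusters (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing cluster in turn...
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]
        # ...or into a new cluster of its own.
        yield [[first]] + partition


def gram(V):
    """Build the PSD matrix A = V V^T from n row vectors of length d."""
    return [[sum(a * b for a, b in zip(u, w)) for w in V] for u in V]


def pairwise_score(A, partition):
    """Sum A[i][j] over all ordered pairs (i, j) inside each cluster.

    The diagonal terms A[i][i] are included; they add the same constant
    to every partition, so they do not change which partition is best.
    """
    return sum(A[i][j] for c in partition for i in c for j in c)


def lowrank_score(V, partition):
    """Same quantity from the factor V: sum over clusters of ||sum_i v_i||^2."""
    total = 0.0
    for c in partition:
        s = [sum(V[i][t] for i in c) for t in range(len(V[0]))]
        total += sum(x * x for x in s)
    return total


def brute_force_best(V):
    """Exhaustive search (assumption: tiny n only) for the best partition."""
    best = max(set_partitions(list(range(len(V)))),
               key=lambda p: lowrank_score(V, p))
    return sorted(sorted(c) for c in best), lowrank_score(V, best)


# Rank-1 toy data: objects 0 and 1 point one way, objects 2 and 3 the other.
V = [[1.0], [2.0], [-1.0], [-3.0]]
print(brute_force_best(V))  # -> ([[0, 1], [2, 3]], 25.0)
```

In the rank-1 case the optimum simply groups vectors by sign, which the toy data reproduces. The point of the sketch is only the identity between `pairwise_score` and `lowrank_score`, which is what lets a low-rank factor replace the full n × n matrix.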
Related resources
Iterative Views Agreement: An Iterative Low-Rank Based Structured Optimization Method to Multi-View Spectral Clustering
Multi-view spectral clustering, which aims to produce a consensus grouping of data objects across multiple views using their graph Laplacian matrices, is a fundamental clustering problem. Among the existing methods, the Low-Rank Representation (LRR) based method is notably strong in effectiveness, intuitiveness, and robustness to noise corruption. However, it aggressively tries...
Low-rank matrix reconstruction and clustering via approximate message passing
We study the problem of reconstructing low-rank matrices from their noisy observations. We formulate the problem in the Bayesian framework, which allows us to exploit structural properties of matrices in addition to low-rankedness, such as sparsity. We propose an efficient approximate message passing algorithm, derived from the belief propagation algorithm, to perform the Bayesian inference for...
Generalized Association Plots: Information Visualization via Iteratively Generated Correlation Matrices (Statistica Sinica 12 (2002), 7-29)
Given a p-dimensional proximity matrix $D_{p \times p}$, a sequence of correlation matrices, $R = (R^{(1)}, R^{(2)}, \ldots)$, is iteratively formed from it. Here $R^{(1)}$ is the correlation matrix of the original proximity matrix $D$ and $R^{(n)}$ is the correlation matrix of $R^{(n-1)}$, $n > 1$. This sequence was first introduced by McQuitty (1968); Breiger, Boorman and Arabie (1975) developed an algorithm, CONCOR, based on their rediscovery of it...
Tensor Sparse and Low-Rank based Submodule Clustering Method for Multi-way Data
A new submodule clustering method via sparse and low-rank representation for multi-way data is proposed in this paper. Instead of reshaping multi-way data into vectors, this method maintains their natural orders to preserve data intrinsic structures, e.g., image data kept as matrices. To implement clustering, the multi-way data, viewed as tensors, are represented by the proposed tensor sparse an...
High-Rank Matrix Completion and Clustering under Self-Expressive Models
We propose efficient algorithms for simultaneous clustering and completion of incomplete high-dimensional data that lie in a union of low-dimensional subspaces. We cast the problem as finding a completion of the data matrix so that each point can be reconstructed as a linear or affine combination of a few data points. Since the problem is NP-hard, we propose a lifting framework and reformulate ...
Publication date: 2017